Corpus overview
For this portfolio I have chosen to compare two of my favorite composers: Joe Hisaishi and Hans Zimmer. Both are known for their film scores, each with a unique style. I’m interested in how they differ in the way they compose and whether this can be explained by geographic region, as Joe Hisaishi mostly works on Japanese films while Hans Zimmer built his career in Hollywood. I expect the tracks composed by Joe Hisaishi to be happier in nature than the overall mood of Hans Zimmer’s tracks, as the few tracks I know by Hisaishi are all very happy and energetic, while Zimmer’s tracks give me more of a feeling of epic grandeur and solemnity.
Strengths and limitations
I compare these two composers using the Spotify feature ‘get artist audio features’, which retrieves all tracks by both composers. In total there are 1976 tracks available from Hans Zimmer and 1577 from Joe Hisaishi. For most analyses, such as a scatterplot, I can use all available tracks, and the difference in the number of tracks is not a big issue because I’m only interested in the underlying structure. However, for the classification analysis (under the tab Decision Tree) I chose to train the model on the twenty most popular tracks from each composer to speed things up. This could introduce some bias, as the training data is quite small. Still, I believe the trained model can give us some insight into the features that describe both composers.
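To make the selection step for the classifier concrete, here is a minimal Python sketch of keeping the twenty most popular tracks per composer. It assumes each track is a dict with the hypothetical keys `artist_name`, `track_name` and `popularity`; it is an illustration, not the code actually used for this portfolio.

```python
from collections import defaultdict


def top_n_per_artist(tracks, n=20):
    """Return the n most popular tracks for each artist.

    Assumes each track is a dict with the (hypothetical) keys
    'artist_name', 'track_name' and 'popularity'.
    """
    by_artist = defaultdict(list)
    for track in tracks:
        by_artist[track["artist_name"]].append(track)
    # Sort each artist's tracks from most to least popular and keep the top n.
    return {
        artist: sorted(items, key=lambda t: t["popularity"], reverse=True)[:n]
        for artist, items in by_artist.items()
    }


# Toy example with made-up popularity scores.
tracks = [
    {"artist_name": "Hans Zimmer", "track_name": "Track A", "popularity": 40},
    {"artist_name": "Hans Zimmer", "track_name": "Track B", "popularity": 70},
    {"artist_name": "Joe Hisaishi", "track_name": "Track C", "popularity": 55},
]
training_set = top_n_per_artist(tracks, n=1)
```

Because only the top of each list is kept, the training set is small and skewed toward well-known tracks, which is exactly the bias mentioned above.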
Short introduction of the two composers
Joe Hisaishi is one of the most famous Japanese composers, known for over 100 film scores and for his association with the animator Hayao Miyazaki. He has written many compositions with worldwide success, including the opening theme of Miyazaki’s film Spirited Away and later the score of Howl’s Moving Castle. His composing style can be described as gentle, minimalistic and melancholic, and it has become a trademark of much of Studio Ghibli’s output.
Hans Zimmer is a German film score composer and producer. His work is notable for integrating electronic sounds with traditional orchestral arrangements. He has many award-winning compositions, including The Lion King and the Pirates of the Caribbean series.
Left: Hans Zimmer, Right: Joe Hisaishi
According to the Spotify API, the feature ‘energy’ represents a perceptual measure of intensity and activity. As shown in the violin plot, the tracks of both artists have relatively low energy. The shapes of the two violins are very similar, which indicates that both artists have a similar energy level across their work; this is supported by the fact that both have an average energy of 0.21 (indicated by the red dot). A low energy level means that a track feels slow, quiet and not very noisy. To better understand the feature ‘energy’, the next graph introduces another feature, ‘valence’.
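The red dot in the violin plot is a per-artist mean. A minimal Python sketch of that summary, assuming tracks are dicts with hypothetical keys `artist_name` and `energy` (the values below are made up, not the real corpus):

```python
from collections import defaultdict
from statistics import mean, median


def summarize_feature(tracks, feature="energy"):
    """Mean, median and count of one audio feature per artist.

    Assumes each track is a dict with the (hypothetical) key
    'artist_name' plus the feature itself, e.g. 'energy' in [0, 1].
    """
    values = defaultdict(list)
    for track in tracks:
        values[track["artist_name"]].append(track[feature])
    return {
        artist: {"mean": mean(v), "median": median(v), "n": len(v)}
        for artist, v in values.items()
    }


tracks = [
    {"artist_name": "Hans Zimmer", "energy": 0.10},
    {"artist_name": "Hans Zimmer", "energy": 0.32},
    {"artist_name": "Joe Hisaishi", "energy": 0.21},
]
summary = summarize_feature(tracks)
```

Reporting the median next to the mean is a design choice: with skewed distributions like these, two artists can share a mean while their violins differ in shape.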
In this graph valence is plotted against energy for both artists. Each point represents an album by one of the artists. As the scatterplot shows, the density is largest in the lower-left area, where both energy and valence are low. Hans Zimmer’s work mostly has a very low valence (0-0.125) and a low to medium energy (0-0.50), while Joe Hisaishi’s work is spread more evenly across the lower-left section. From both graphs we can conclude that the two artists have a fairly similar composing style with low valence and low energy, i.e. the soundtracks sound rather sad according to music theory (low valence, low energy).
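The ‘low valence, low energy means sad’ reading treats the scatterplot as four quadrants. A minimal Python sketch of that reading follows; the 0.5 threshold and the quadrant labels are assumptions for illustration, not values taken from the Spotify API.

```python
from collections import Counter


def mood_quadrant(valence, energy, threshold=0.5):
    """Map a (valence, energy) pair to a quadrant of the valence-energy plane.

    The 0.5 threshold and the labels are assumptions for illustration.
    """
    if valence >= threshold and energy >= threshold:
        return "happy/excited"
    if valence >= threshold:
        return "calm/content"
    if energy >= threshold:
        return "angry/tense"
    return "sad/depressed"


def quadrant_shares(points):
    """Fraction of (valence, energy) points that fall in each quadrant."""
    counts = Counter(mood_quadrant(v, e) for v, e in points)
    return {label: count / len(points) for label, count in counts.items()}


# Made-up points, mostly in the lower-left region of the scatterplot.
shares = quadrant_shares([(0.05, 0.20), (0.10, 0.45), (0.60, 0.30), (0.08, 0.10)])
```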
As mentioned before, Hans and Joe have fairly similar composing styles according to the Spotify API. I’m curious to see where the differences lie when I compare two of their soundtracks with the same energy and valence. I’ll compare Hans Zimmer’s soundtrack ‘Toupee Or Not- Toupee’ with Joe Hisaishi’s soundtrack ‘The Dorok Army Strikes Back’ (both have an energy of 0.73 and a valence of 0.97).
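Finding such a pair can be done by comparing every Zimmer track with every Hisaishi track. A minimal Python sketch, assuming track dicts with hypothetical keys `track_name`, `energy` and `valence`; the tolerance `tol` is also an assumption, since Spotify reports these features as floats:

```python
def matching_pairs(tracks_a, tracks_b, tol=0.005):
    """Pairs of tracks (one per artist) whose energy and valence differ by at most tol.

    Track dicts use the hypothetical keys 'track_name', 'energy' and 'valence'.
    """
    pairs = []
    for a in tracks_a:
        for b in tracks_b:
            if (abs(a["energy"] - b["energy"]) <= tol
                    and abs(a["valence"] - b["valence"]) <= tol):
                pairs.append((a["track_name"], b["track_name"]))
    return pairs


# Energy/valence of the two named tracks as reported above; "Track X" is made up.
zimmer = [
    {"track_name": "Toupee Or Not- Toupee", "energy": 0.73, "valence": 0.97},
    {"track_name": "Track X", "energy": 0.20, "valence": 0.05},
]
hisaishi = [
    {"track_name": "The Dorok Army Strikes Back", "energy": 0.73, "valence": 0.97},
]
pairs = matching_pairs(zimmer, hisaishi)
```

The nested loop is quadratic, which is fine for a few thousand tracks per artist.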
As shown in the chromagram, Zimmer’s soundtrack uses various keys; the pitches with the most energy lie mostly in the range F# to A, and pitch B is also used frequently. The green block (pitch B) at the end of the timeline could be explained by the soundtrack ending with a crash cymbal.
Compared to the chromagram of Zimmer’s soundtrack, Hisaishi’s soundtrack has the most energy in pitch G throughout. Unlike Zimmer’s track, it pays little attention to pitch B, although overall it is easier to see what the two tracks have in common than what sets them apart.
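Statements such as ‘most energy in pitch G’ can be checked by averaging the chroma over time. A minimal Python sketch, assuming segments shaped like Spotify audio-analysis segments (dicts with `duration` in seconds and a 12-value `pitches` list ordered C to B); the segment values below are made up:

```python
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class_profile(segments):
    """Duration-weighted average chroma over all segments.

    Assumes dicts with 'duration' (seconds) and 'pitches' (12 values, C to B).
    """
    totals = [0.0] * 12
    for seg in segments:
        for i, value in enumerate(seg["pitches"]):
            totals[i] += value * seg["duration"]
    total_time = sum(seg["duration"] for seg in segments)
    return {pc: t / total_time for pc, t in zip(PITCH_CLASSES, totals)}


def dominant_pitch_classes(segments, k=3):
    """The k pitch classes with the highest average chroma energy."""
    profile = pitch_class_profile(segments)
    return sorted(profile, key=profile.get, reverse=True)[:k]


# Two made-up segments in which G is strongest and B weakest.
segments = [
    {"duration": 1.0, "pitches": [0.2, 0.1, 0.5, 0.1, 0.3, 0.2, 0.4, 1.0, 0.1, 0.3, 0.1, 0.05]},
    {"duration": 0.5, "pitches": [0.3, 0.1, 0.4, 0.1, 0.2, 0.1, 0.8, 1.0, 0.2, 0.5, 0.1, 0.05]},
]
top = dominant_pitch_classes(segments)
```

Weighting by duration keeps many very short segments (e.g. a cymbal hit) from dominating the profile.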
This cepstrogram is based on Hans Zimmer’s soundtrack ‘Time’, which he wrote for the ending sequence of the film ‘Inception’. The structure of this track can be seen in the cepstrogram, in which c01 is the Spotify timbre component describing loudness, c02 the amount of energy in the lower frequencies and c03 the amount of energy in the midrange frequencies of a given time frame. The loudness clearly builds up slowly during the first half (up to approx. 120 s), with instruments such as piano, cello and tuba providing energy in the low and midrange frequencies. As the loudness builds up, so does the intensity of the track; from 120 s to approximately 220 s the climax is reached, which is also the part with the intertwining melodies we are all so familiar with. In this part many layers of instruments join in, mirroring the narrative of the film, where the protagonist’s emotions reach their peak as he thinks about his wife back in limbo. This could also explain the energy in the more abstract components of the cepstrogram (c04 and up), as more instruments intertwine with each other. Afterwards the loudness drops as the track fades out, while low-frequency instruments still play (as seen in the last part of c02).
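The build-up and fade-out described here can be quantified by averaging one timbre coefficient per time window. A minimal Python sketch, assuming Spotify-style segments with `start`, `duration` and a 12-value `timbre` list; the window length and the made-up segment values are assumptions:

```python
from collections import defaultdict


def mean_timbre_by_window(segments, window=30.0, coef=0):
    """Duration-weighted mean of one timbre coefficient per time window.

    coef=0 corresponds to c01, coef=1 to c02, and so on. A segment is
    assigned to the window in which it starts.
    """
    sums = defaultdict(float)
    weights = defaultdict(float)
    for seg in segments:
        idx = int(seg["start"] // window)
        sums[idx] += seg["timbre"][coef] * seg["duration"]
        weights[idx] += seg["duration"]
    # Key each result by the start time (in seconds) of its window.
    return {idx * window: sums[idx] / weights[idx] for idx in sorted(sums)}


segments = [
    {"start": 0.0, "duration": 10.0, "timbre": [10.0] + [0.0] * 11},
    {"start": 10.0, "duration": 10.0, "timbre": [20.0] + [0.0] * 11},
    {"start": 40.0, "duration": 10.0, "timbre": [50.0] + [0.0] * 11},
]
c01_per_window = mean_timbre_by_window(segments)
```

With `window=120.0`, the same function would contrast the first 120 s of the track with the climax that follows.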
This keygram is based on the same track, ‘Time’ by Hans Zimmer, that was analyzed with the cepstrogram. I was wondering whether a keygram could give me more insight into the overall structure of this iconic track.
First of all, let us look a bit more closely at the keygram itself. A keygram shows the similarity between each chroma vector and a set of chord or key templates. Here key templates are used, so the keygram can be used to estimate the key over time. Interestingly, in the first section (until approximately 25 s) all keys are marked yellow, and the same applies to the last section of the track (from approximately 280 s onward). This means there is greater certainty in estimating keys at the beginning and end of the track, according to the key templates. A possible explanation is that the middle part of this track is partly synthesized, as Hans Zimmer is known for integrating electronic sounds with more traditional arrangements. The Spotify API may then have a hard time estimating key profiles, because the synthesized sound lacks a clear tonal structure.
Still, some repeating patterns can be found in the keygram. From the parts that are clearly visible, it can be concluded that Spotify does not prefer any specific key, as keys across the whole spectrum are recognized to a similar degree. This confirms the intuition that Spotify is not able to recognize the key of this track, even though this famous track is known to be in G.
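The template matching behind a keygram can be sketched for a single chroma vector. This minimal Python sketch uses the Krumhansl-Kessler major and minor key profiles and cosine similarity, which is a common choice; the keygram in this portfolio may have used a different profile set or distance measure. A full keygram repeats this for every chroma frame over time.

```python
from math import sqrt

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Krumhansl-Kessler key profiles, indexed from the tonic upward.
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def key_templates():
    """All 24 major/minor templates, rotated so that index 0 is always C."""
    templates = {}
    for tonic in range(12):
        name = PITCH_CLASSES[tonic]
        templates[f"{name} major"] = [MAJOR[(i - tonic) % 12] for i in range(12)]
        templates[f"{name} minor"] = [MINOR[(i - tonic) % 12] for i in range(12)]
    return templates


def rank_keys(chroma):
    """Keys sorted from most to least similar to one chroma vector."""
    scores = {key: cosine_similarity(chroma, t) for key, t in key_templates().items()}
    return sorted(scores, key=scores.get, reverse=True)


# A chroma vector containing only a G major triad (G, B, D).
g_triad = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1]
best_key = rank_keys(g_triad)[0]
```

If a frame’s chroma is nearly flat, all 24 similarities end up close together, so no single key stands out in that part of the keygram.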
Here the track ‘Time’ is analyzed again, now with tempo as the main focus. For the entire track, a novelty function (above) and a tempogram (below) are generated. The novelty function indicates onsets, which appear as spikes in the graph. The irregular pattern of spikes shows that the overall tempo is hard to estimate. The tempogram shows the same: ideally I would see a regular pattern within a certain tempo range, but this is not the case. A quick Google search told me that the tempo of this track is around 60 BPM, which is quite slow (fun fact: this means exactly one beat per second, which is also how fast time passes!); however, the graphs do not clearly show this. Again, the Spotify API has difficulty producing clear low-level representations for a track that is partly synthesized and also integrates a wide range of classical instruments.
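The idea behind the novelty function and tempo estimation can be sketched in a few lines of Python. This is a simplification, assuming a loudness envelope sampled at a fixed `frame_rate`. Novelty is the positive change in loudness, and a single global tempo is read from the autocorrelation peak. A real tempogram repeats this kind of analysis in short windows over time.

```python
def beat_period(bpm):
    """Seconds per beat: 60 BPM gives exactly 1 second."""
    return 60.0 / bpm


def novelty_from_loudness(envelope):
    """Half-wave rectified first difference of a loudness envelope.

    Only increases in loudness count, so spikes mark likely onsets.
    """
    return [max(0.0, b - a) for a, b in zip(envelope, envelope[1:])]


def estimate_tempo(novelty, frame_rate, min_bpm=40.0, max_bpm=200.0):
    """Global tempo (BPM) from the autocorrelation peak of a novelty curve.

    frame_rate is the number of novelty values per second; only lags that
    correspond to tempi between min_bpm and max_bpm are considered.
    """
    min_lag = max(1, int(round(frame_rate * 60.0 / max_bpm)))
    max_lag = min(len(novelty) - 1, int(round(frame_rate * 60.0 / min_bpm)))
    best_lag, best_score = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(novelty[i] * novelty[i + lag] for i in range(len(novelty) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    if best_lag is None:
        raise ValueError("novelty curve too short for the requested BPM range")
    return 60.0 * frame_rate / best_lag


# Synthetic envelope: a loudness jump once per second, 100 frames per second.
envelope = [1.0 if i % 100 == 0 else 0.0 for i in range(1000)]
tempo = estimate_tempo(novelty_from_loudness(envelope), frame_rate=100)
```

On a clean pulse like this the peak is unambiguous. When onsets are irregular, as in the novelty function above, the autocorrelation has no single clear peak.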
Here I compare the tempo of the 25 most popular tracks of each composer. The mean tempo is plotted against the standard deviation of tempo, with point size indicating duration and opacity indicating volume.
At first glance, most tracks seem to have a mean tempo of around 75-90 BPM. Overall there are quite large differences between individual tracks as well as between the two groups of tracks. When we look at the most popular tracks, we can conclude that they do not follow a standard pattern in tempo variation, duration or volume. This is interesting because it shows that both composers write tracks in different styles, and these tracks are all appreciated by the audience.
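One way to obtain a mean and standard deviation of tempo per track is to summarize the tempo of its sections. A minimal Python sketch, assuming Spotify-style sections with `tempo` (BPM) and `duration` (seconds) and weighting each section by its duration; the plot above may have been computed differently (e.g. unweighted).

```python
from math import sqrt


def weighted_tempo_stats(sections):
    """Duration-weighted mean and standard deviation of section tempos.

    Assumes dicts with 'tempo' (BPM) and 'duration' (seconds).
    """
    total = sum(s["duration"] for s in sections)
    mean = sum(s["tempo"] * s["duration"] for s in sections) / total
    variance = sum(s["duration"] * (s["tempo"] - mean) ** 2 for s in sections) / total
    return mean, sqrt(variance)


# Made-up track with two equally long sections at 60 and 80 BPM.
mean_bpm, sd_bpm = weighted_tempo_stats([
    {"tempo": 60.0, "duration": 10.0},
    {"tempo": 80.0, "duration": 10.0},
])
```

Weighting by duration means that a short intro in a different tempo shifts the mean less than a long main section does.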
Here I build a decision tree based on two playlists, one for each composer. I’m interested in how well this decision tree can classify a track, i.e. predict which composer it belongs to. Besides this, I’m also interested in which features are most important for the classification. The results are shown here.
First of all, I’m surprised by how well this decision tree performs. Although the results vary slightly from run to run, precision and recall stay around 90 percent, which is high, especially given the small training set.
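For reference, precision and recall for one class (here a composer) follow directly from counting predictions. A minimal Python sketch with made-up labels; it is not the evaluation code used in this portfolio.

```python
def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of predictions that are right
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of true tracks that are found
    return precision, recall


# Made-up predictions for 20 tracks: one mistake in each direction.
y_true = ["Hans"] * 10 + ["Joe"] * 10
y_pred = ["Hans"] * 9 + ["Joe"] + ["Joe"] * 9 + ["Hans"]
hans_scores = precision_recall(y_true, y_pred, positive="Hans")
```

With only a few test tracks per composer, a single misclassified track moves these numbers by several percentage points, which helps explain the run-to-run variation.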
The random forest results make it clear that timbre components 4 and 6 are the most important features for distinguishing whether a track belongs to Joe or Hans. The feature ‘acousticness’ is also prominent. Joe’s playlist appears to score high on both timbre component 4 and timbre component 6 (yellow dots in the graph), the opposite of Hans’ playlist. This is interesting because these two timbre components are more abstract and harder to explain than the lower timbre components. It is hard to tell whether we can rely on these components the way the algorithm does, because we lack meaningful interpretations of them.
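To give a feel for what ‘feature importance’ means, here is a deliberately simplified Python sketch: for each feature it finds the single threshold split with the largest drop in Gini impurity and ranks features by that drop. This decision stump is a stand-in for illustration; a random forest averages such impurity decreases over many trees and splits. The feature names `c01` and `c04` and all values are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(values, labels):
    """Largest Gini decrease over all thresholds on one feature, and that threshold."""
    parent = gini(labels)
    n = len(labels)
    pairs = sorted(zip(values, labels))
    xs = [v for v, _ in pairs]
    ys = [label for _, label in pairs]
    best_gain, best_threshold = 0.0, None
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # no threshold fits between equal values
        left, right = ys[:i], ys[i:]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        if parent - child > best_gain:
            best_gain, best_threshold = parent - child, (xs[i - 1] + xs[i]) / 2
    return best_gain, best_threshold


def rank_features(rows, labels, features):
    """Features sorted by how well a single split separates the classes."""
    gains = {f: best_split([r[f] for r in rows], labels)[0] for f in features}
    return sorted(features, key=gains.get, reverse=True)


# Made-up data: 'c04' separates the composers perfectly, 'c01' does not.
rows = [
    {"c01": 0.5, "c04": 0.9}, {"c01": 0.1, "c04": 0.8},
    {"c01": 0.4, "c04": 0.1}, {"c01": 0.2, "c04": 0.2},
]
labels = ["Joe", "Joe", "Hans", "Hans"]
ranking = rank_features(rows, labels, ["c01", "c04"])
```

A high ranking only says that a feature separates the training tracks well; it does not explain what the feature means musically, which is exactly the problem with timbre components 4 and 6.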
Thoughts about the whole process
When I first started the course and heard that we had to come up with our own corpus, I was immediately drawn to the idea of comparing my two favorite composers, Hans Zimmer and Joe Hisaishi. However, I had no clue how this comparison would be carried out during the course, especially by means of the Spotify API. Now, at the end of the process, I look back with a lot of pride and joy. I have learned a lot about data visualization and about extracting meaningful Spotify features of interest.
What I have learned from my corpus
As mentioned in the introduction, based on my own experience I expected the tracks by Joe Hisaishi to be happier in nature than the tracks by Hans Zimmer. However, the Spotify API tells me that they have fairly similar styles (low energy, low tempo and a bit sad). I was also able to analyze the iconic track ‘Time’ by its timbre and chroma features. The results were mixed: sometimes Spotify yields meaningful feature visualizations (like the cepstrogram) and sometimes it does not (like the tempo estimation), although it could also be that I overlooked some errors in my code. Finally, I’m surprised by the positive result of the classification algorithm. Apparently the algorithm sees a clear difference between the tracks by Hans and the tracks by Joe when looking at timbre components 4 and 6. This is an interesting finding compared to the conclusions drawn before, where Hans and Joe have much in common.